A new method for making objective probabilistic climate
forecasts from numerical climate models based on Jeffreys' Prior
Abstract

We argue that it would be desirable to use Jeffreys' priors in the construction of numerical model 
based probabilistic climate forecasts, in order that those forecasts could be argued to be objective. 
Hitherto, this has been considered computationally unfeasible. We propose an approximation that 
we believe makes it feasible, and derive closed-form expressions for various simple cases. 

1 Introduction 

There are many reasons for trying to predict future climate, ranging from the assessment of insurance 
risk to the determination of appropriate government policy. These different uses of climate forecasts need 
predictions of different aspects of climate, on different time-scales, and this has led to the use of a range 
of different methods for making climate predictions. For instance, the assessment of insurance risk is 
typically based on predictions of extreme climate over lead times of just a few years, and such predictions 
are generally made using an appropriate blend of statistical and numerical methods. The planning of 
government policy, on the other hand, is typically based on predictions of mean climate over lead times 
of many decades, and such predictions are generally made using numerical climate models, such as those
described in the IPCC fourth assessment report (IPCC, 2007).

In both statistical and numerical climate forecasting, attempts are being made to understand and incor- 
porate all sources of uncertainty into forecasts. If we split uncertainty into aleatoric (irreducible) and 
epistemic (potentially reducible) components, then estimating the aleatoric uncertainty (often known as 
randomness or variability in statistical modelling, and initial condition uncertainty in numerical climate
modelling) is typically the easier of the two. In statistical model climate predictions aleatoric uncertainty 
can be estimated directly from data, and in numerical model climate predictions it can be estimated by 
creating initial condition ensembles. Estimating epistemic uncertainty (often separated into model and 
parameter uncertainty) is typically more difficult. In statistical modelling Bayesian methods can be used 
to estimate parameter uncertainty, and various different methods have been proposed for incorporating 
model uncertainty (although none are particularly satisfactory). In numerical modelling a number of 
approaches for estimating parameter uncertainty have been proposed and tested, such as those described
in Frame et al. (2005), Tomassini et al. (2007) and Allen et al. (2009), and model uncertainty has been
estimated by using multimodel ensembles, such as that described in Meehl et al. (2007).

In this article we consider the estimation of the parameter uncertainty component of the epistemic 
uncertainty in numerical climate models. Methods proposed to date have various advantages and
disadvantages. The classical statistical methods used in Allen et al. (2009) have the advantage that they
avoid the use of subjective priors, but they do not give a method for producing a probabilistic forecast,
and uncertainty is only represented by confidence intervals around best guess predictions. The subjective
Bayesian methods described in Frame et al. (2005) and Tomassini et al. (2007) do produce probabilistic
forecasts, but the use of subjective priors introduces arbitrariness into the forecast. This arbitrariness 
is fine if the users of the forecast are prepared to accept the priors that have been used. On the other 
hand, it would be perfectly reasonable not to accept the proposed priors, and the resulting forecast, since 
they are essentially arbitrary rather than scientifically determined. Such forecasts can never, therefore, 
be expected to lead to much of a consensus about likely future climate states, unless the data starts to 
overwhelm the prior (which appears not to be the case for now). For policy making this is unfortunate, 
since lack of consensus on the underlying forecasts is likely to lead to lack of consensus on the appropriate 
course of action. Another disadvantage of subjective priors is that forecasts made using such priors can
never be back-tested in an honest way, since it would never be possible to argue that the prior had not
been formulated using data from the testing period. With climate forecasts based on subjective priors
it will never be possible, therefore, to make the standard mathematical modelling argument that one
should believe forecasts of the future because forecasts of the past performed well. Instead, the belief
(or lack of belief) that forecasts of the future are likely to be accurate has to be based solely on faith in
the modelling process (or lack of faith), rather than a combination of faith in the modelling process and
empirical demonstration of predictive ability.

* Correspondence email: Stephen.jewson@rms.com

To make up for this lack of methods for producing non-arbitrary probabilistic climate forecasts we are 
embarking on an attempt to understand how to use Jeffreys' Priors to produce probabilistic climate 
forecasts. Jeffreys' Priors are the standard conventional prior used in situations where it is preferable to 
avoid including subjective information. They thus offer the hope of being able to achieve a greater level 
of consensus among scientists as to the distribution of future climate states. In addition, they allow for 
the possibility that climate models could be back-tested in an honest way, which, if such back-testing 
indicates that climate models can make good out-of-sample forecasts, should lead to greater confidence 
in climate model predictions of the future. 

Jeffreys' Priors are already being used in statistical climate forecasts used in industry (Jewson, 2008).
One of the reasons that they have not been used to date in numerical model forecasts is that they have
been considered to be too computationally demanding (see the comments in Tomassini et al. (2007), page
1243). We are hoping to be able to show that this is not the case, through the use of judicious
approximations, and through the use of distributed computing available as part of the climateprediction.net
project.

In section 2 below we introduce Jeffreys' Priors. In section 3 we then discuss an approximation that can
be made when calculating Jeffreys' Priors for numerical climate models, that we believe renders them
practical. Finally in section 4 we summarise and discuss our findings.



2 Jeffreys' Priors 

We will work within a notational framework in which we have historical climate data $x$, and we are trying
to make a probabilistic prediction of future climate data $y$. We write this prediction as $p(y|x)$. Given a
model for this probability distribution (which could, at this point, be a statistical model or a numerical
model ensemble), we can make a prediction using:

$$p(y|x) = \int p(y|\theta)\, p(\theta|x)\, d\theta \qquad (1)$$

where $\theta$ is the parameter vector of the model.

This equation says that our prediction is going to be made up of a weighted average of predictions $p(y|\theta)$
from models with different parameter values, weighted by the probability of each value of the parameter
given the data, $p(\theta|x)$. Using Bayes' theorem, we can factorise $p(\theta|x)$, giving:

$$p(y|x) \propto \int p(y|\theta)\, p(x|\theta)\, p(\theta)\, d\theta \qquad (2)$$
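As a minimal sketch of how equation (2) can be evaluated in practice, the integral can be approximated by a sum over an evenly spaced parameter grid. The toy model below (observations and future values both normally distributed about a single parameter with unit standard deviation), the grid, the data values and the function names are all assumptions made for illustration, not part of the method described here.

```python
import math

def normal_pdf(v, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def predictive_density(y, x_obs, thetas, prior, sd=1.0):
    """Grid approximation to p(y|x) for a toy model x_i, y ~ N(theta, sd^2)."""
    # Unnormalised weight of each grid point: likelihood p(x|theta) times prior p(theta).
    weights = []
    for theta in thetas:
        likelihood = 1.0
        for xi in x_obs:
            likelihood *= normal_pdf(xi, theta, sd)
        weights.append(likelihood * prior(theta))
    total = sum(weights)
    # Weighted average of the per-parameter predictions p(y|theta).
    # The grid spacing cancels because the grid is evenly spaced.
    return sum(w * normal_pdf(y, t, sd) for w, t in zip(weights, thetas)) / total

thetas = [-5.0 + 0.01 * k for k in range(1001)]  # hypothetical parameter grid
x_obs = [0.8, 1.1, 1.4]                          # made-up "historical" data
flat_prior = lambda theta: 1.0                   # constant prior for a shift parameter
density_at_mean = predictive_density(1.1, x_obs, thetas, flat_prior)
```

For this toy model the result can be checked against the known predictive distribution, which is normal with mean equal to the sample mean and variance $1 + 1/n$.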

$p(x|\theta)$ is known as the likelihood (in both classical and Bayesian statistics) and can be evaluated by
comparing the performance of the various models with data. $p(\theta)$ is the prior. Tomassini et al. (2007)
and others have used subjective priors, based on expert judgement. To minimise the arbitrariness, or
subjectiveness, of our forecasts we would, however, like to choose the prior in a non-arbitrary way. Setting
the prior to a constant is not an option, since a prior that is constant in one coordinate system is in general
not constant in another coordinate system. The only practical solution currently available is Jeffreys' Prior, defined
as constant for parameters that represent a shift, $1/\sigma$ for parameters $\sigma$ that represent a scaling, and for
other more general parameters $\theta$ (where $\theta$ can be either a single parameter, or a parameter vector with
components $\theta_j$) as:

$$p(\theta) = \sqrt{\det\left[-E\left(\frac{\partial^2 \ln p}{\partial\theta_j\,\partial\theta_k}\right)\right]} \qquad (3)$$



where $p = p(x|\theta)$, and the expectation is over all possible values of $x$. We note as a warning to readers who
may not have come across Jeffreys' Prior before that this expression is somewhat difficult to understand.
In particular we note that the quantity $p(x|\theta)$ is the likelihood function of the model for arbitrary $x$, as
distinct from the $p(x|\theta)$ that occurs in equation (2) above, which is the same likelihood function, but with
$x$ set to the observed values. The expectation operator $E$ is an integral over all possible values for $x$ that
could have occurred in the past (i.e. is a typical classical statistical expectation). It does not commute
with the derivative of $\ln p$.
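As a check on the general definition, it can be applied to two textbook cases: a normal distribution with unknown mean (a shift) and a normal distribution with unknown standard deviation (a scaling). The normal model is an assumption chosen purely for illustration; the results recover the constant and $1/\sigma$ priors quoted above.

```latex
% Shift parameter: x ~ N(\theta, \sigma^2), with \sigma known.
\ln p = -\ln\left(\sqrt{2\pi}\,\sigma\right) - \frac{(x-\theta)^2}{2\sigma^2},
\qquad
\frac{\partial^2 \ln p}{\partial\theta^2} = -\frac{1}{\sigma^2},
\qquad
p(\theta) = \sqrt{\frac{1}{\sigma^2}} = \text{constant in } \theta.

% Scale parameter: x ~ N(\mu, \sigma^2), with \mu known.
% Derivatives here are taken with respect to \sigma itself, not \sigma^2.
\frac{\partial^2 \ln p}{\partial\sigma^2} = \frac{1}{\sigma^2} - \frac{3(x-\mu)^2}{\sigma^4},
\qquad
E\left(\frac{\partial^2 \ln p}{\partial\sigma^2}\right)
  = \frac{1}{\sigma^2} - \frac{3}{\sigma^2} = -\frac{2}{\sigma^2},
\qquad
p(\sigma) = \frac{\sqrt{2}}{\sigma} \propto \frac{1}{\sigma}.
```

In the shift case the second derivative does not depend on $x$, so the expectation is trivial; in the scale case the expectation uses $E(x-\mu)^2 = \sigma^2$.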

Jeffreys' prior was originally presented in Jeffreys (1946), and has been widely used since. The main
attraction of Jeffreys' Prior is that it has the property that it is invariant under coordinate transformation
of $\theta$: the final prediction does not depend on the coordinates $\theta$ that are chosen to parametrise the model.
The proof of this is standard, but typically only given in very abbreviated form. For completeness, and
clarity, we include the proof in the appendix, both in the single parameter form (in appendix A) and in the
multiple parameter form (in appendix B). Jeffreys' Prior also has many other interesting properties, that
have been widely discussed in the statistics literature (see, for example, Bernardo and Smith (1993)).



3 Approximations to Jeffreys' Priors for use in climate modelling

How, then, might we evaluate Jeffreys' prior for a climate model? First, we note that Jeffreys' prior is 
only a function of the model, and not of the observational data. So evaluating Jeffreys' prior is 'simply' 
going to be a question of running the climate model a number of times, in the right way, and processing 
the output. Given careful experimental design, the integrations needed to calculate Jeffreys' Prior could 
be the same as those needed to calculate the likelihood term $p(x|\theta)$ in equation (2), thus minimizing the
computational effort required. The obvious brute-force approach to evaluating Jeffreys' prior would then
be:

• Run initial condition ensembles on a parameter grid to estimate $p(x|\theta)$ (with one initial condition
ensemble for each value of $\theta$).

• Numerically differentiate $\ln p(x|\theta)$ to give $\partial^2 \ln p / \partial\theta_j\,\partial\theta_k$.

• Numerically take the expectation, to give $E\left(\partial^2 \ln p / \partial\theta_j\,\partial\theta_k\right)$.

• Take the square root as in equation (3), at each value of $\theta$.
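The steps above can be sketched for a single parameter as follows. This is only an illustration under strong assumptions: an analytic toy density (normal, with mean $\theta^2$ and unit standard deviation) stands in for the density that would in practice have to be estimated from initial condition ensembles, which is where the real computational cost lies. The function names, step sizes and quadrature grid are hypothetical choices.

```python
import math

def log_density(x, theta):
    """Toy stand-in for ln p(x|theta): normal with mean theta**2 and unit sd."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta ** 2) ** 2

def brute_force_jeffreys(theta, h=1e-3, x_half_width=10.0, n_x=4001):
    """Evaluate sqrt(-E[d^2 ln p / d theta^2]) by finite differences and quadrature."""
    centre = theta ** 2
    dx = 2.0 * x_half_width / (n_x - 1)
    expectation = 0.0
    for k in range(n_x):
        x = centre - x_half_width + k * dx
        # Central second difference of ln p with respect to theta, at fixed x.
        d2 = (log_density(x, theta + h) - 2.0 * log_density(x, theta)
              + log_density(x, theta - h)) / h ** 2
        # Weight by p(x|theta): the expectation is over all x that could have occurred.
        expectation += d2 * math.exp(log_density(x, theta)) * dx
    return math.sqrt(-expectation)

prior_values = {theta: brute_force_jeffreys(theta) for theta in (0.5, 1.0, 2.0)}
```

For this toy density the exact answer is $2|\theta|$, which gives a simple check on the numerical differentiation and quadrature.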

The use of emulators (aka response surfaces) can probably help in the estimation of $p(x|\theta)$, but
nevertheless this approach is likely to be computationally challenging, given the need to produce an estimate
of the entire distribution $p(x|\theta)$ at each value of $\theta$, and the large ensemble sizes this implies. At this
point it would be tempting to be put off. However, we believe that there is a simple approximation that
makes this potentially feasible.



3.1 The assumption of normality 

The approximation that we propose to make the Jeffreys' Prior more tractable is to assume normality
(aka Gaussianity) for the distribution for $x$. We can then write $p(x|\theta) = p(x|\mu, \sigma^2, C)$, where $\mu = \mu(\theta)$ and
$\sigma^2 = \sigma^2(\theta)$ are vectors of ensemble means and ensemble variances, and $C = C(\theta)$ is a matrix of correlation
coefficients. The derivative terms in the definition of the Jeffreys' Prior then become derivatives of $\mu$,
$\sigma$ and $C$, rather than derivatives of $\ln p$. These new derivatives can be evaluated from numerical model
integrations with much smaller ensembles than would be needed to evaluate the derivatives of $\ln p$.
We believe this approximation is reasonable, since most climate models are validated against monthly,
seasonal, annual or even decadal mean data, and such data are typically close to normally distributed.



3.2 The assumption of independence 

Under the assumption of normality we believe it may be possible to write a closed-form expression for 
Jeffreys' Prior, although it is somewhat difficult (and is a work in progress). To simplify the problem, 
therefore, we also assume that the data used to validate the climate model are independent. Whether 
this is true or not will vary from case to case, and depends on exactly what validation data is used and 
at what time intervals. But if true, Jeffreys' Prior can be written very simply, and is derived below. To 
make the derivation easy to follow we derive four cases, in terms of increasing complexity, leading up to 
the most general case. 



3.2.1 Single observation, single parameter 

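In this (artificially simple) case the probability $p(x|\theta)$ is given by:

$$p(x|\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (4)$$

where $x$ is the single observation we are validating against, $\theta$ is the single parameter in the climate model,
and $\mu(\theta)$ and $\sigma(\theta)$ are the ensemble mean and ensemble standard deviation of initial condition ensembles
as a function of $\theta$.
This gives:

$$\ln p = -\ln\sqrt{2\pi} - \ln\sigma - \frac{(x-\mu)^2}{2\sigma^2} \qquad (5)$$

Taking first derivatives wrt the parameter $\theta$ gives:

$$\frac{\partial \ln p}{\partial\theta} = -\frac{1}{\sigma}\frac{\partial\sigma}{\partial\theta} + \frac{(x-\mu)^2}{\sigma^3}\frac{\partial\sigma}{\partial\theta} + \frac{(x-\mu)}{\sigma^2}\frac{\partial\mu}{\partial\theta} \qquad (6)$$

Taking second derivatives gives:

$$\begin{aligned}
\frac{\partial^2 \ln p}{\partial\theta^2} = {} & -\frac{1}{\sigma}\frac{\partial^2\sigma}{\partial\theta^2} + \frac{1}{\sigma^2}\left(\frac{\partial\sigma}{\partial\theta}\right)^2 \\
& + \frac{(x-\mu)^2}{\sigma^3}\frac{\partial^2\sigma}{\partial\theta^2} - \frac{3(x-\mu)^2}{\sigma^4}\left(\frac{\partial\sigma}{\partial\theta}\right)^2 - \frac{2(x-\mu)}{\sigma^3}\frac{\partial\sigma}{\partial\theta}\frac{\partial\mu}{\partial\theta} \\
& + \frac{(x-\mu)}{\sigma^2}\frac{\partial^2\mu}{\partial\theta^2} - \frac{2(x-\mu)}{\sigma^3}\frac{\partial\mu}{\partial\theta}\frac{\partial\sigma}{\partial\theta} - \frac{1}{\sigma^2}\left(\frac{\partial\mu}{\partial\theta}\right)^2
\end{aligned} \qquad (7)$$

Taking expectations over $x$, and using the definitions of $\mu$ and $\sigma$, which imply that:

$$E(x-\mu) = 0 \qquad (8)$$

$$E(x-\mu)^2 = \sigma^2 \qquad (9)$$

this reduces to:

$$E\left(\frac{\partial^2 \ln p}{\partial\theta^2}\right) = -\frac{2}{\sigma^2}\left(\frac{\partial\sigma}{\partial\theta}\right)^2 - \frac{1}{\sigma^2}\left(\frac{\partial\mu}{\partial\theta}\right)^2 \qquad (10)$$

Jeffreys' Prior is thus given by:

$$p(\theta) = \sqrt{-E\left(\frac{\partial^2 \ln p}{\partial\theta^2}\right)} \qquad (11)$$

$$\phantom{p(\theta)} = \sqrt{\frac{2}{\sigma^2}\left(\frac{\partial\sigma}{\partial\theta}\right)^2 + \frac{1}{\sigma^2}\left(\frac{\partial\mu}{\partial\theta}\right)^2} \qquad (12)$$

One might additionally make the assumption that this expression is dominated by variations in the mean
rather than in the standard deviation, in which case this simplifies further to:

$$p(\theta) = \frac{1}{\sigma}\left|\frac{\partial\mu}{\partial\theta}\right| \qquad (13)$$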
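A minimal sketch of how the two expressions for the prior in this section might be evaluated numerically is given below. Central finite differences stand in for derivatives that would in practice be estimated from model integrations, and the response functions `mu` and `sigma`, the step size `h` and the function name are hypothetical choices made for illustration.

```python
import math

def jeffreys_prior_normal(mu, sigma, theta, h=1e-5, constant_sigma=False):
    """Unnormalised Jeffreys' Prior for one observation and one parameter under normality."""
    # Central differences stand in for derivatives estimated from model runs.
    dmu = (mu(theta + h) - mu(theta - h)) / (2.0 * h)
    s = sigma(theta)
    if constant_sigma:
        # Mean-dominated simplification: (1/sigma) * |d mu / d theta|.
        return abs(dmu) / s
    dsigma = (sigma(theta + h) - sigma(theta - h)) / (2.0 * h)
    # Full expression: sqrt( (2/sigma^2)(d sigma/d theta)^2 + (1/sigma^2)(d mu/d theta)^2 ).
    return math.sqrt(2.0 * dsigma ** 2 + dmu ** 2) / s

# Hypothetical response of the ensemble mean and spread to the parameter.
mu = lambda theta: 3.0 * math.log(theta)
sigma = lambda theta: 0.5 + 0.1 * theta

full = jeffreys_prior_normal(mu, sigma, 2.0)
mean_only = jeffreys_prior_normal(mu, sigma, 2.0, constant_sigma=True)
```

In this made-up example the spread varies only weakly with the parameter, so the two values are close; the simplified form is never larger than the full form.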



3.2.2 Multiple observations, single parameter 



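This case is very similar to the previous case, but slightly more realistic in that we now have $n$ independent
observations, rather than just one:

$$p(x|\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right) \qquad (14)$$

$$\ln p = \sum_{i=1}^{n}\left[-\ln\sqrt{2\pi} - \ln\sigma_i - \frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right] \qquad (15)$$

$$\frac{\partial \ln p}{\partial\theta} = \sum_{i=1}^{n}\left[-\frac{1}{\sigma_i}\frac{\partial\sigma_i}{\partial\theta} + \frac{(x_i-\mu_i)^2}{\sigma_i^3}\frac{\partial\sigma_i}{\partial\theta} + \frac{(x_i-\mu_i)}{\sigma_i^2}\frac{\partial\mu_i}{\partial\theta}\right] \qquad (16)$$

$$\begin{aligned}
\frac{\partial^2 \ln p}{\partial\theta^2} = \sum_{i=1}^{n}\Bigg[ & -\frac{1}{\sigma_i}\frac{\partial^2\sigma_i}{\partial\theta^2} + \frac{1}{\sigma_i^2}\left(\frac{\partial\sigma_i}{\partial\theta}\right)^2 \\
& + \frac{(x_i-\mu_i)^2}{\sigma_i^3}\frac{\partial^2\sigma_i}{\partial\theta^2} - \frac{3(x_i-\mu_i)^2}{\sigma_i^4}\left(\frac{\partial\sigma_i}{\partial\theta}\right)^2 - \frac{2(x_i-\mu_i)}{\sigma_i^3}\frac{\partial\sigma_i}{\partial\theta}\frac{\partial\mu_i}{\partial\theta} \\
& + \frac{(x_i-\mu_i)}{\sigma_i^2}\frac{\partial^2\mu_i}{\partial\theta^2} - \frac{2(x_i-\mu_i)}{\sigma_i^3}\frac{\partial\mu_i}{\partial\theta}\frac{\partial\sigma_i}{\partial\theta} - \frac{1}{\sigma_i^2}\left(\frac{\partial\mu_i}{\partial\theta}\right)^2 \Bigg]
\end{aligned} \qquad (17)$$

$$E\left(\frac{\partial^2 \ln p}{\partial\theta^2}\right) = -\sum_{i=1}^{n}\left[\frac{2}{\sigma_i^2}\left(\frac{\partial\sigma_i}{\partial\theta}\right)^2 + \frac{1}{\sigma_i^2}\left(\frac{\partial\mu_i}{\partial\theta}\right)^2\right] \qquad (18)$$

$$p(\theta) = \sqrt{\sum_{i=1}^{n}\left[\frac{2}{\sigma_i^2}\left(\frac{\partial\sigma_i}{\partial\theta}\right)^2 + \frac{1}{\sigma_i^2}\left(\frac{\partial\mu_i}{\partial\theta}\right)^2\right]} \qquad (19)$$

In the case in which the $\sigma_i$ are assumed constant (i.e. independent of $\theta$) this becomes:

$$p(\theta) = \sqrt{\sum_{i=1}^{n}\frac{1}{\sigma_i^2}\left(\frac{\partial\mu_i}{\partial\theta}\right)^2} \qquad (20)$$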



3.2.3 Single observation, multiple parameters 

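This case is very similar to the single observation single parameter case, but now involves derivatives wrt
pairs of parameters. For clarity, we start by writing a pair of parameters as $(\theta, \phi)$. Proceeding exactly
as in the single parameter case (differentiating $\ln p$ once wrt $\theta$ and once wrt $\phi$, and then taking expectations
over $x$), the expected mixed second derivative is:

$$E\left(\frac{\partial^2 \ln p}{\partial\theta\,\partial\phi}\right) = -\frac{2}{\sigma^2}\frac{\partial\sigma}{\partial\theta}\frac{\partial\sigma}{\partial\phi} - \frac{1}{\sigma^2}\frac{\partial\mu}{\partial\theta}\frac{\partial\mu}{\partial\phi} \qquad (25)$$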



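If we now switch notation from using the two parameters $(\theta, \phi)$ to multiple parameters $(\theta_1, \ldots, \theta_m)$ Jeffreys'
Prior is given by:

$$p(\theta) = \sqrt{\det\left[-E\left(\frac{\partial^2 \ln p}{\partial\theta_j\,\partial\theta_k}\right)\right]} \qquad (26)$$

$$\phantom{p(\theta)} = \sqrt{\det\left(\frac{2}{\sigma^2}\frac{\partial\sigma}{\partial\theta_j}\frac{\partial\sigma}{\partial\theta_k} + \frac{1}{\sigma^2}\frac{\partial\mu}{\partial\theta_j}\frac{\partial\mu}{\partial\theta_k}\right)} \qquad (27)$$

In the case where $\sigma$ is assumed constant this becomes:

$$p(\theta) = \sqrt{\det\left(\frac{1}{\sigma^2}\frac{\partial\mu}{\partial\theta_j}\frac{\partial\mu}{\partial\theta_k}\right)} \qquad (28)$$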



3.2.4 Multiple observations, multiple parameters 

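This is the general case, and is the first case that could be applied to real climate models. Once again,
we initially consider a pair of parameters $(\theta, \phi)$. Because the observations are independent, the single
observation result carries over term by term, summed over the $n$ observations:

$$E\left(\frac{\partial^2 \ln p}{\partial\theta\,\partial\phi}\right) = -\sum_{i=1}^{n}\left[\frac{2}{\sigma_i^2}\frac{\partial\sigma_i}{\partial\theta}\frac{\partial\sigma_i}{\partial\phi} + \frac{1}{\sigma_i^2}\frac{\partial\mu_i}{\partial\theta}\frac{\partial\mu_i}{\partial\phi}\right]$$

If we again switch notation from using the two parameters $(\theta, \phi)$ to multiple parameters $(\theta_1, \ldots, \theta_m)$
Jeffreys' Prior in this case is given by:

$$p(\theta) = \sqrt{\det\left[-E\left(\frac{\partial^2 \ln p}{\partial\theta_j\,\partial\theta_k}\right)\right]} \qquad (35)$$

$$\phantom{p(\theta)} = \sqrt{\det\left(\sum_{i=1}^{n}\left[\frac{2}{\sigma_i^2}\frac{\partial\sigma_i}{\partial\theta_j}\frac{\partial\sigma_i}{\partial\theta_k} + \frac{1}{\sigma_i^2}\frac{\partial\mu_i}{\partial\theta_j}\frac{\partial\mu_i}{\partial\theta_k}\right]\right)} \qquad (36)$$

In the case where the $\sigma_i$ are assumed constant this becomes:

$$p(\theta) = \sqrt{\det\left(\sum_{i=1}^{n}\frac{1}{\sigma_i^2}\frac{\partial\mu_i}{\partial\theta_j}\frac{\partial\mu_i}{\partial\theta_k}\right)} \qquad (37)$$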
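The constant-variance form of the prior for multiple observations and multiple parameters can be sketched as below. This is an illustration under assumptions: `mean_fn` is a hypothetical stand-in for a set of (large) initial condition ensembles returning the ensemble means at a given parameter vector, the standard deviations are taken as given, and derivatives are approximated by central differences, which costs two evaluations per parameter.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def jeffreys_prior_constant_sigma(mean_fn, sigmas, theta, h=1e-5):
    """sqrt(det(sum_i (1/sigma_i^2) dmu_i/dtheta_j dmu_i/dtheta_k)) by central differences."""
    m = len(theta)
    # jac[j][i] approximates d mu_i / d theta_j.
    jac = []
    for j in range(m):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        mu_up, mu_down = mean_fn(up), mean_fn(down)
        jac.append([(u - d) / (2.0 * h) for u, d in zip(mu_up, mu_down)])
    n_obs = len(sigmas)
    matrix = [[sum(jac[j][i] * jac[k][i] / sigmas[i] ** 2 for i in range(n_obs))
               for k in range(m)] for j in range(m)]
    # Guard against tiny negative determinants caused by rounding.
    return math.sqrt(max(determinant(matrix), 0.0))

# Hypothetical model: three observed means responding to two parameters.
def mean_fn(theta):
    a, b = theta
    return [a + b, a * b, a - 2.0 * b]

prior_value = jeffreys_prior_constant_sigma(mean_fn, [1.0, 2.0, 0.5], [1.0, 3.0])
```

With a single parameter the matrix is 1 by 1 and the function reduces to the single parameter, multiple observation expression.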



4 Summary and Discussion 

We have argued that Jeffreys' Prior may be a useful way to create probabilistic forecasts of future climate 
from numerical climate models, since it is the least arbitrary choice of prior, and hence reduces the 
potential for argument about which prior is likely the best choice. It also allows for honest back-testing 
of climate models, unlike subjective priors. 

Computing Jeffreys' priors for the general case of arbitrary distributions of modelled and observed vari- 
ables is likely to be computationally too demanding, given the computational cost of climate models. 
However, we have argued that by assuming that the variables against which the model is validated 
are Gaussian the problem becomes tractable. Using this approximation computing the Jeffreys' prior 
then involves computing derivatives of the mean, variance and correlation coefficients of initial condition
ensembles with respect to the parameters. Under the further assumption that the observations are
independent the correlation matrix becomes diagonal and the computation of the prior reduces to evaluation
of first derivatives of the mean and variance of initial condition ensembles. This can be simplified even
further by assuming that the variance is roughly constant, in which case the prior becomes a function of
the sensitivity of the ensemble mean to variations in the model parameters.

There are a number of areas of further work, and a number of outstanding questions.

The most important next step is to attempt to apply Jeffreys' Prior, as approximated above, to climate
model results. We are trying this using both a simple energy balance model and a fully complex climate
model. There are many practical questions related to implementation that we have not discussed here.
It would also be useful, if possible, to derive closed-form solutions for the case of correlated observations.

There are many outstanding questions. We are particularly interested in the question of whether to use
what we call a deterministic or stochastic approach to experimental design for climate model integrations. 
We distinguish between these approaches as follows. Consider an experiment in which we use n initial 
condition ensembles (each for fixed parameters), each with m members. In the limiting case as m becomes 
very large, the initial condition uncertainty will disappear entirely from the ensemble means and variances, 
and the ensemble means and variances become a purely deterministic function of the parameters. Using 
large values of m is therefore what we call the deterministic approach. On the other hand, for m = 1, the 
ensemble mean is affected by initial condition uncertainty, and is highly stochastic (i.e. not a deterministic 
function of the model parameters any more, although it does contain a deterministic signal, obscured by 
the noise). Using m = 1 is what we call the stochastic (or stochastic parameter) approach. We believe 
that there are statistical reasons why the stochastic parameter approach is the most efficient, based on the
theory of experimental design, although the modelling of the variance response to changing parameters 
is certainly then more complex. 
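The distinction between the two approaches can be illustrated with a toy simulation. Everything in the sketch below is an assumption made for illustration: the "model" is a deterministic signal $2\theta$ plus unit-variance normal noise representing initial condition uncertainty, and the ensemble sizes and number of repeats are arbitrary.

```python
import random
import statistics

def ensemble_mean(theta, m, rng, noise_sd=1.0):
    """Mean of an m-member initial condition ensemble for a toy model with signal 2*theta."""
    members = [2.0 * theta + rng.gauss(0.0, noise_sd) for _ in range(m)]
    return statistics.fmean(members)

def spread_of_ensemble_mean(theta, m, repeats, rng):
    """Standard deviation of the ensemble mean across independent repeat ensembles."""
    means = [ensemble_mean(theta, m, rng) for _ in range(repeats)]
    return statistics.stdev(means)

rng = random.Random(0)
spread_m1 = spread_of_ensemble_mean(1.0, 1, 2000, rng)      # stochastic approach, m = 1
spread_m100 = spread_of_ensemble_mean(1.0, 100, 2000, rng)  # approaching the deterministic limit
```

In this toy setting the noise in the ensemble mean falls roughly as one over the square root of $m$, so the ensemble mean only becomes an approximately deterministic function of the parameter for large $m$.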

Another set of related outstanding questions relates to emulators. It is now well accepted that emulators
can, in many cases, improve probabilistic predictions, including those from climate models. But can
emulators also help in evaluating the Jeffreys' Prior? Probably, but questions remain. For instance: if an
emulator is to be used to help evaluate equation (37), then should the emulator be applied before or after
taking the derivative?

A Proof that Jeffreys' Prior is the same under coordinate transformations, for a single parameter

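If there is just a single parameter $\theta$, then Jeffreys' Prior is defined as:

$$p(\theta) = \sqrt{-E\left(\frac{\partial^2 \ln p}{\partial\theta^2}\right)} \qquad (38)$$

First, we will show that

$$E\left(\frac{\partial^2 \ln p}{\partial\theta^2}\right) = -E\left[\left(\frac{\partial \ln p}{\partial\theta}\right)^2\right]$$

To show this, we note that

$$\frac{\partial^2 \ln p}{\partial\theta^2} = \frac{1}{p}\frac{\partial^2 p}{\partial\theta^2} - \left(\frac{\partial \ln p}{\partial\theta}\right)^2$$

Multiplying by $p$ and integrating over all $x$ gives:

$$E\left(\frac{\partial^2 \ln p}{\partial\theta^2}\right) = \frac{\partial^2}{\partial\theta^2}\int p\, dx - E\left[\left(\frac{\partial \ln p}{\partial\theta}\right)^2\right] = -E\left[\left(\frac{\partial \ln p}{\partial\theta}\right)^2\right]$$

since $\int p\, dx = 1$ for all $\theta$. Given this result, Jeffreys' Prior can then be written as:

$$p(\theta) = \sqrt{E\left[\left(\frac{\partial \ln p}{\partial\theta}\right)^2\right]}$$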

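Using this result, a prediction based on Jeffreys' Prior is given by:

$$p(y|x) \propto \int p(y|\theta)\, p(x|\theta)\, \sqrt{E\left[\left(\frac{\partial \ln p}{\partial\theta}\right)^2\right]}\, d\theta$$

If we now change variables from $\theta$ to $\phi$, and apply standard rules for changing variables, we find, for the
first part of the integrand:

$$p(y|\theta) = p(y|\phi)$$

For the second part of the integrand:

$$p(x|\theta) = p(x|\phi)$$

For the third part of the integrand:

$$\sqrt{E\left[\left(\frac{\partial \ln p}{\partial\theta}\right)^2\right]} = \sqrt{E\left[\left(\frac{d\phi}{d\theta}\right)^2\left(\frac{\partial \ln p}{\partial\phi}\right)^2\right]} = \left|\frac{d\phi}{d\theta}\right| \sqrt{E\left[\left(\frac{\partial \ln p}{\partial\phi}\right)^2\right]}$$

and for the fourth part of the integrand:

$$d\theta = \left|\frac{d\theta}{d\phi}\right| d\phi$$

Putting this all together, the prediction becomes:

$$p(y|x) \propto \int p(y|\phi)\, p(x|\phi)\, \left|\frac{d\phi}{d\theta}\right| \sqrt{E\left[\left(\frac{\partial \ln p}{\partial\phi}\right)^2\right]}\, \left|\frac{d\theta}{d\phi}\right| d\phi$$

and so:

$$p(y|x) \propto \int p(y|\phi)\, p(x|\phi)\, \sqrt{E\left[\left(\frac{\partial \ln p}{\partial\phi}\right)^2\right]}\, d\phi$$

We see that we would have achieved the same result had we parametrised using $\phi$ in the first place. In
other words it does not matter which coordinates we choose: the prediction will always be the same.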
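The invariance can also be checked numerically in a simple case. The sketch below assumes a normal model with constant standard deviation, for which Jeffreys' Prior is proportional to the magnitude of the derivative of the mean, and a hypothetical model response $\mu = \theta^2$ reparametrised through $\theta = e^{\phi}$. The function names, step size and integration rule are illustrative choices. The (unnormalised) prior mass assigned to a parameter range should be the same in either coordinate system.

```python
import math

def prior_from_mean(mean_fn, param, sigma=1.0, h=1e-6):
    """Jeffreys' Prior (1/sigma)|d mu/d param| for a normal model with constant sigma."""
    return abs(mean_fn(param + h) - mean_fn(param - h)) / (2.0 * h * sigma)

def integrate(fn, lo, hi, n=2000):
    """Midpoint rule on n equal sub-intervals."""
    step = (hi - lo) / n
    return sum(fn(lo + (k + 0.5) * step) for k in range(n)) * step

mean_in_theta = lambda theta: theta ** 2      # hypothetical model response
mean_in_phi = lambda phi: math.exp(phi) ** 2  # the same model, written with theta = exp(phi)

# Prior mass on the same physical range, computed in each coordinate system.
mass_theta = integrate(lambda t: prior_from_mean(mean_in_theta, t), 1.0, 3.0)
mass_phi = integrate(lambda p: prior_from_mean(mean_in_phi, p), math.log(1.0), math.log(3.0))
```

A constant prior would not pass this check: a flat prior in $\theta$ assigns mass 2 to this range, while a flat prior in $\phi$ assigns mass $\ln 3$.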

B Proof that Jeffreys' Prior is the same under coordinate transformations, for multiple parameters

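For multiple parameters $\theta$ Jeffreys' Prior is defined as:

$$p(\theta) = \sqrt{\det\left[-E\left(\frac{\partial^2 \ln p}{\partial\theta_j\,\partial\theta_k}\right)\right]}$$

As in the single parameter case, we note that

$$\frac{\partial^2 \ln p}{\partial\theta_j\,\partial\theta_k} = \frac{1}{p}\frac{\partial^2 p}{\partial\theta_j\,\partial\theta_k} - \frac{\partial \ln p}{\partial\theta_j}\frac{\partial \ln p}{\partial\theta_k}$$

Multiplying by $p$ and integrating over all $x$ gives:

$$E\left(\frac{\partial^2 \ln p}{\partial\theta_j\,\partial\theta_k}\right) = -E\left(\frac{\partial \ln p}{\partial\theta_j}\frac{\partial \ln p}{\partial\theta_k}\right)$$

Given this result, Jeffreys' Prior can then be written as:

$$p(\theta) = \sqrt{\det E\left(\frac{\partial \ln p}{\partial\theta_j}\frac{\partial \ln p}{\partial\theta_k}\right)}$$

Using this result, a prediction based on Jeffreys' Prior is given by:

$$p(y|x) \propto \int p(y|\theta)\, p(x|\theta)\, \sqrt{\det E\left(\frac{\partial \ln p}{\partial\theta_j}\frac{\partial \ln p}{\partial\theta_k}\right)}\, d\theta$$

If we now change variables from $\theta$ to $\phi$, and apply standard rules for changing variables, we find, for the
first part of the integrand:

$$p(y|\theta) = p(y|\phi) \qquad (69)$$

For the second part of the integrand:

$$p(x|\theta) = p(x|\phi) \qquad (70)$$

For the third part of the integrand:

$$\sqrt{\det E\left(\frac{\partial \ln p}{\partial\theta_j}\frac{\partial \ln p}{\partial\theta_k}\right)} = \left|J(\theta, \phi)\right|^{-1} \sqrt{\det E\left(\frac{\partial \ln p}{\partial\phi_j}\frac{\partial \ln p}{\partial\phi_k}\right)}$$

(where $J(\theta, \phi) = \det\left(\partial\theta_j / \partial\phi_k\right)$ is the Jacobian determinant)
and for the fourth part of the integrand:

$$d\theta = \left|J(\theta, \phi)\right| d\phi$$

Putting this all together, the prediction becomes:

$$p(y|x) \propto \int p(y|\phi)\, p(x|\phi)\, \sqrt{\det E\left(\frac{\partial \ln p}{\partial\phi_j}\frac{\partial \ln p}{\partial\phi_k}\right)}\, d\phi$$

We see that also in this case we would have achieved the same result had we parametrised using $\phi$ in the
first place.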